Goto

Collaborating Authors

 unit cell


Materium: An Autoregressive Approach for Material Generation

Dobberstein, Niklas, Hamaekers, Jan

arXiv.org Artificial Intelligence

We present Materium: an autoregressive transformer for generating crystal structures that converts 3D material representations into token sequences. These sequences include elements with oxidation states, fractional coordinates and lattice parameters. Unlike diffusion approaches, which refine atomic positions iteratively through many denoising steps, Materium places atoms at precise fractional coordinates, enabling fast, scalable generation. With this design, the model can be trained in a few hours on a single GPU and generate samples much faster on GPUs and CPUs than diffusion-based approaches. The model was trained and evaluated using multiple properties as conditions, including fundamental properties, such as density and space group, as well as more practical targets, such as band gap and magnetic density. In both single and combined conditions, the model performs consistently well, producing candidates that align with the requested inputs.


OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction

Jin, Emily, Nica, Andrei Cristian, Galkin, Mikhail, Rector-Brooks, Jarrid, Lee, Kin Long Kelvin, Miret, Santiago, Arnold, Frances H., Bronstein, Michael, Bose, Avishek Joey, Tong, Alexander, Liu, Cheng-Hao

arXiv.org Artificial Intelligence

Accurately predicting experimentally-realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal structure prediction (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce OXtal, a large-scale 100M parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale OXtal, we abandon explicit equivariant architectures imposing inductive bias arising from crystal symmetries in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, Stoichiometric Stochastic Shell Sampling ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization -- thus enabling more scalable architectural choices at all-atom resolution. By leveraging a large dataset of 600K experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, OXtal recovers experimental structures with conformer $\text{RMSD}_1<0.5$ Å and attains over 80\% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.


All that structure matches does not glitter

Martirossyan, Maya M., Egg, Thomas, Hoellmer, Philipp, Karypis, George, Transtrum, Mark, Roitberg, Adrian, Liu, Mingjie, Hennig, Richard G., Tadmor, Ellad B., Martiniani, Stefano

arXiv.org Artificial Intelligence

Generative models for materials, especially inorganic crystals, hold potential to transform the theoretical prediction of novel compounds and structures. Advancement in this field depends on robust benchmarks and minimal, information-rich datasets that enable meaningful model evaluation. This paper critically examines common datasets and reported metrics for a crystal structure prediction task$\unicode{x2014}$generating the most likely structures given the chemical composition of a material. We focus on three key issues: First, materials datasets should contain unique crystal structures; for example, we show that the widely-utilized carbon-24 dataset only contains $\approx$40% unique structures. Second, materials datasets should not be split randomly if polymorphs of many different compositions are numerous, which we find to be the case for the perov-5 and MP-20 datasets. Third, benchmarks can mislead if used uncritically, e.g., reporting a match rate metric without considering the structural variety exhibited by identical building blocks. To address these oft-overlooked issues, we introduce several fixes. We provide revised versions of the carbon-24 dataset: one with duplicates removed, one deduplicated and split by number of atoms $N$, one with enantiomorphs, and two containing only identical structures but with different unit cells. We also propose new splits for datasets with polymorphs, ensuring that polymorphs are grouped within each split subset, setting a more sensible standard for benchmarking model performance. Finally, we present METRe and cRMSE, new model evaluation metrics that can correct existing issues with the match rate metric.


Reconfigurable Auxetic Devices (RADs) for Robotic Surface Manipulation

Miske, Jacob, Maya, Ahyan, Inkiad, Ahnaf, Lipton, Jeffrey Ian

arXiv.org Artificial Intelligence

Robotic surfaces traditionally use materials with a positive Poisson's ratio to push and pull on a manipulation interface. Auxetic materials with a negative Poisson's ratio may expand in multiple directions when stretched and enable conformable interfaces. Here we demonstrate reconfigurable auxetic lattices for robotic surface manipulation. Our approach enables shape control through reconfigurable locking or embedded servos that underactuate an auxetic lattice structure. Variable expansion of local lattice areas is enabled by backlash between unit cells. Demonstrations of variable surface conformity are presented with characterization metrics. Experimental results are validated against a simplified model of the system, which uses an activation function to model intercell coupling with backlash. Reconfigurable auxetic structures are shown to achieve manipulation via variable surface contraction and expansion. This structure maintains compliance with backlash in contrast with previous work on auxetics, opening new opportunities in adaptive robotic structures for surface manipulation tasks.


Generative models for crystalline materials

Metni, Houssam, Ruple, Laura, Walters, Lauren N., Torresi, Luca, Teufel, Jonas, Schopmans, Henrik, Östreicher, Jona, Zhang, Yumeng, Neubert, Marlen, Koide, Yuri, Steiner, Kevin, Link, Paul, Bär, Lukas, Petrova, Mariana, Ceder, Gerbrand, Friederich, Pascal

arXiv.org Artificial Intelligence

Understanding structure-property relationships in materials is fundamental in condensed matter physics and materials science. Over the past few years, machine learning (ML) has emerged as a powerful tool for advancing this understanding and accelerating materials discovery. Early ML approaches primarily focused on constructing and screening large material spaces to identify promising candidates for various applications. More recently, research efforts have increasingly shifted toward generating crystal structures using end-to-end generative models. This review analyzes the current state of generative modeling for crystal structure prediction and \textit{de novo} generation. It examines crystal representations, outlines the generative models used to design crystal structures, and evaluates their respective strengths and limitations. Furthermore, the review highlights experimental considerations for evaluating generated structures and provides recommendations for suitable existing software tools. Emerging topics, such as modeling disorder and defects, integration in advanced characterization, and incorporating synthetic feasibility constraints, are explored. Ultimately, this work aims to inform both experimental scientists looking to adapt suitable ML models to their specific circumstances and ML specialists seeking to understand the unique challenges related to inverse materials design and discovery.


PRISM: Periodic Representation with multIscale and Similarity graph Modelling for enhanced crystal structure property prediction

Solé, Àlex, Mosella-Montoro, Albert, Cardona, Joan, Aravena, Daniel, Gómez-Coca, Silvia, Ruiz, Eliseo, Ruiz-Hidalgo, Javier

arXiv.org Artificial Intelligence

Crystal structures are characterised by repeating atomic patterns within unit cells across three-dimensional space, posing unique challenges for graph-based representation learning. Current methods often overlook essential periodic boundary conditions and multiscale interactions inherent to crystalline structures. In this paper, we introduce PRISM, a graph neural network framework that explicitly integrates multiscale representations and periodic feature encoding by employing a set of expert modules, each specialised in encoding distinct structural and chemical aspects of periodic systems. Extensive experiments across crystal structure-based benchmarks demonstrate that PRISM improves state-of-the-art predictive accuracy, significantly enhancing crystal property prediction.


A Bio-inspired Redundant Sensing Architecture

Anh Tuan Nguyen, Jian Xu, Zhi Yang

Neural Information Processing Systems

Sensing is the process of deriving signals from the environment that allows artificial systems to interact with the physical world. The Shannon theorem specifies the maximum rate at which information can be acquired [1]. However, this upper bound is hard to achieve in many man-made systems. The biological visual systems, on the other hand, have highly efficient signal representation and processing mechanisms that allow precise sensing. In this work, we argue that redundancy is one of the critical characteristics for such superior performance.




Why Physics Still Matters: Improving Machine Learning Prediction of Material Properties with Phonon-Informed Datasets

Benítez, Pol, López, Cibrán, Saucedo, Edgardo, Mizoguchi, Teruyasu, Cazorla, Claudio

arXiv.org Artificial Intelligence

Machine learning (ML) methods have become powerful tools for predicting material properties with near first-principles accuracy and vastly reduced computational cost. However, the performance of ML models critically depends on the quality, size, and diversity of the training dataset. In materials science, this dependence is particularly important for learning from low-symmetry atomistic configurations that capture thermal excitations, structural defects, and chemical disorder, features that are ubiquitous in real materials but underrepresented in most datasets. The absence of systematic strategies for generating representative training data may therefore limit the predictive power of ML models in technologically critical fields such as energy conversion and photonics. In this work, we assess the effectiveness of graph neural network (GNN) models trained on two fundamentally different types of datasets: one composed of randomly generated atomic configurations and another constructed using physically informed sampling based on lattice vibrations. As a case study, we address the challenging task of predicting electronic and mechanical properties of a prototypical family of optoelectronic materials under realistic finite-temperature conditions. We find that the phonons-informed model consistently outperforms the randomly trained counterpart, despite relying on fewer data points. Explainability analyses further reveal that high-performing models assign greater weight to chemically meaningful bonds that control property variations, underscoring the importance of physically guided data generation. Overall, this work demonstrates that larger datasets do not necessarily yield better GNN predictive models and introduces a simple and general strategy for efficiently constructing high-quality training data in materials informatics.